Abstract

We consider the problem of estimating covariance and precision matrices, 
and their associated discriminant coefficients, from normal data when the 
rank of the covariance matrix is strictly smaller than its dimension and 
the available sample size. Using unbiased risk estimation, we construct
novel estimators by minimizing upper bounds on the difference in risk over
several classes. The proposed estimators are empirically demonstrated to offer
substantial improvement over classical approaches.

Keywords: Covariance matrix, precision matrix, discriminant function, 
LDA, unbiased risk estimator, Moore-Penrose inverse, singular normal, 
singular Wishart. 

2000 MSC: Primary 62C15; secondary 62F10, 62H12. 


1. Introduction 

With the recent explosion of high throughput data, much interest has 
arisen in applications where the number of feature parameters is greater 
than the sample size. In this situation, it is typically assumed that, despite 
their number, the underlying components are linearly independent, or in 
other words that their covariance matrix has full rank. However, little
attention has been given to the situation where there is dependence between
the components, that is, where the covariance matrix would be singular.
Recently, Tsukuma and Kubokawa (2015) investigated the problem of
estimating the mean vector of a multivariate normal distribution when the
unknown covariance matrix is singular. By deriving an unbiased risk
estimator for the quadratic loss, they were able to give sufficient conditions for
an estimator to dominate the maximum likelihood estimator.

This article is concerned with the same model as Tsukuma and Kubokawa
(2015), but we consider three different estimation problems. Unlike the mean
estimation problem, all three estimation scenarios depend on the second-order
moment of the distribution. In each case we provide decision-theoretic
results that lead to improved inference. The first task is the estimation of
the singular covariance matrix itself, under an invariant squared loss. This
problem was first considered in the full rank case by Haff (1980), and in
the high-dimensional setting by Konno (2009). The second concern is the
estimation of the Moore-Penrose pseudo-inverse of the covariance matrix,
also known as the precision matrix, under the Frobenius loss. This problem
was first considered in the full rank case by Haff (1977, 1979a) and in the
high-dimensional setting by Kubokawa and Srivastava (2008).



Finally, we consider the problem of estimating the discriminant
coefficients that arise in Linear Discriminant Analysis (LDA) under the squared
loss, a problem first considered in the full rank case by Haff (1986) and Dey
and Srinivasan (1991). LDA is a standard method for classification when
the number of observations $n$ is much larger than the number of features
$p$. If the data follow a $p$-variate normal distribution with the same covariance
structure across the groups, it provides an asymptotically optimal
classification rule, meaning that its misclassification error converges to the Bayes risk.
However, it was noted by Dudoit et al. (2002) that a naive implementation
of LDA for high-dimensional data provides poor classification results in
comparison to alternative methods. A rigorous proof of this phenomenon in
the case $p \gg n$ is given by Bickel and Levina (2004). There are two main
reasons for this. First, standard LDA uses the sample covariance matrix
to estimate the covariance structure, and in high-dimensional settings this
results in a singular estimator. Secondly, by using all $p$ features in
classification, interpretation of the results becomes challenging. One of the popular
approaches to deal with the singularity is to use the independence rule, which
overcomes the singularity problem of the sample covariance but ignores the
dependency structure. This approach is very appealing because of its
simplicity and was encouraged by the work of Bickel and Levina (2004), who
showed that it performs better than the standard LDA in a $p \gg n$ setting when
the population matrix is full rank. Unfortunately, independence is only an
approximation and it is unrealistic in most applications: for instance, in a
genomic context, gene interactions and low-dimensional network structure
are crucial for the understanding of biological processes. In this situation,
one should aim for better estimators of the covariance matrix rather than
relying on an independence structure that assumes a full rank population
covariance matrix. Indeed, we will see in Section 3 that using the diagonal
of the sample covariance matrix is a poor strategy if the true covariance
matrix is rank deficient.

The presentation of our approach to these three estimation problems is
divided as follows. The decision-theoretic results are described in Section 2.
For each of the three problems, we construct an appropriate unbiased
estimator of the risk (URE) using Stein's and Haff's lemmas (Stein, 1986;
Haff, 1979b; Tsukuma and Kubokawa, 2015). We then consider the class of
estimators given by constant multiples of a naive estimator, and minimize an
upper bound on the difference in risk to obtain estimators that dominate
the naive estimator. Finally, we consider a larger class given by the sum
of this estimator and an appropriate trace, and again minimize an upper
bound on the risk to obtain a dominating estimator.

In Section 3 we investigate the amount of improvement provided by
the proposed estimators through a numerical study. Finally, proofs of the
statements of Section 2 are provided in Section 5.


2. Estimation Results 


2.1. Model 


Our setting is similar to the one used in Tsukuma and Kubokawa (2015).
We observe an $n$-sample $X_1, \ldots, X_n$ identically and independently distributed
from a $p$-dimensional multivariate normal distribution $N_p(\mu, \Sigma)$, where $\mu$ and
$\Sigma$ are unknown. However, the $p$-dimensional covariance matrix $\Sigma$ is
rank-deficient with respect to the dimension and the sample size, in the sense
that

$$r = \mathrm{rk}\,\Sigma < \min(n, p). \tag{2.1}$$

The resulting singular multivariate normal distribution does not have a
density with respect to the Lebesgue measure on $\mathbb{R}^p$ but lives in the
$r$-dimensional linear subspace spanned by the columns of $\Sigma$. More details can
be found, for example, in Srivastava and Khatri (1979, Section 2.1).

Define the $n \times p$ data matrix $X = (X_1, \ldots, X_n)^t$. With $\bar X$ the sample
mean, the sample covariance matrix $S = (X - 1_n\bar X^t)^t(X - 1_n\bar X^t)/n$ then
follows a Wishart distribution $W_p(n-1, \Sigma/n)$ with $n - 1$ degrees of freedom.
Since $\Sigma$ is rank-deficient, this distribution
is singular in the terminology of Srivastava and Khatri (1979, Section 3.1).
We warn the reader that the expression "singular Wishart" has also been
used in the literature to describe the different situation where the covariance
is positive-definite and the dimension exceeds the degrees of freedom, as in
Srivastava (2003). Let $S = O_1 L O_1^t$ denote the reduced spectral
decomposition of $S$, where $L = \mathrm{diag}(l_1, \ldots, l_r)$ contains the $r$ non-zero eigenvalues and
$O_1$ is $p \times r$ semi-orthogonal.

In this situation, neither $\Sigma$ nor $S$ is invertible. Since inverses of
covariance matrices are of considerable interest in multivariate statistical analysis,
some generalized inverse of these quantities is desirable. In this article, we
will focus on the Moore-Penrose pseudoinverse, which will be denoted $A^+$ for
a matrix $A$. Definitions and theoretical properties can be found in Harville
(1997, Chapter 20).


The singular multivariate normal model is amenable to decision-theoretic
analysis through a key insight of Tsukuma and Kubokawa (2015, Section
2.2). The authors proved that when (2.1) holds, the subspace spanned
by the sample covariance matrix is almost surely constant and matches
the subspace spanned by the true covariance matrix, in the sense that the
remarkable identity

$$SS^+ = \Sigma\Sigma^+ \tag{2.2}$$

holds. This fact will be repeatedly used in the proofs of Section 5 and is essential to
our derivations.
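
As a rough numerical illustration of identity (2.2), the following sketch simulates a rank-deficient normal sample and compares the two projectors $SS^+$ and $\Sigma\Sigma^+$. The dimensions, the seed and the helper name `pinv_rank` are illustrative assumptions and are not taken from the original analysis.

```python
import numpy as np

def pinv_rank(A, r):
    """Moore-Penrose inverse of a symmetric PSD matrix of known rank r."""
    vals, vecs = np.linalg.eigh(A)          # eigenvalues in ascending order
    vals, vecs = vals[-r:], vecs[:, -r:]    # keep the r largest eigenpairs
    return (vecs / vals) @ vecs.T

rng = np.random.default_rng(0)
n, p, r = 20, 6, 3                          # hypothetical sizes with r < min(n, p)

B = rng.standard_normal((p, r))             # Sigma = B B^t has rank r
Sigma = B @ B.T
mu = np.ones(p)

Z = rng.standard_normal((n, r))
X = mu + Z @ B.T                            # rows are draws from N_p(mu, Sigma)
Xc = X - X.mean(axis=0)
S = Xc.T @ Xc / n                           # sample covariance, rank r almost surely

P_sample = S @ pinv_rank(S, r)              # S S^+
P_true = Sigma @ pinv_rank(Sigma, r)        # Sigma Sigma^+
max_gap = np.abs(P_sample - P_true).max()   # should be at floating-point level
```

The pseudoinverse is computed from the $r$ largest eigenpairs rather than with a tolerance-based routine, so that numerically tiny eigenvalues of $S$ are never inverted.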

Let us now turn our attention to the three problems we wish to solve.
In terms of the notation introduced above, these are:

Covariance matrix estimation. The estimation of $\Sigma$ under the invariant
squared loss $L(\hat\Sigma, \Sigma) = \mathrm{tr}[(\hat\Sigma\Sigma^+ - I_p)^2]$.

Precision matrix estimation. The estimation of $\Sigma^+$ under the Frobenius loss
$L(\hat\Sigma^+, \Sigma^+) = \|\hat\Sigma^+ - \Sigma^+\|_F^2$.

Discriminant coefficient estimation. The estimation of $\eta = \Sigma^+\mu$ under the
square loss $L(\hat\eta, \eta) = \|\hat\eta - \eta\|_2^2$.

The traditional estimators for $\mu$ and $\Sigma$ are the sample mean and
covariance $(\bar X, S)$, which suggests the corresponding naive estimators $S$, $S^+$ and
$S^+\bar X$ for each respective problem. In the next three subsections we will see
that these traditional estimators are inadmissible, and improved estimators will be
developed.
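
The three loss functions can be written down directly. The sketch below is one possible NumPy transcription, assuming the rank $r$ of $\Sigma$ is known; the function names and the toy matrix are hypothetical.

```python
import numpy as np

def pinv_rank(A, r):
    """Moore-Penrose inverse of a symmetric PSD matrix of known rank r."""
    vals, vecs = np.linalg.eigh(A)
    vals, vecs = vals[-r:], vecs[:, -r:]
    return (vecs / vals) @ vecs.T

def invariant_squared_loss(Sigma_hat, Sigma, r):
    """tr[(Sigma_hat Sigma^+ - I_p)^2]."""
    M = Sigma_hat @ pinv_rank(Sigma, r) - np.eye(Sigma.shape[0])
    return np.trace(M @ M)

def frobenius_loss(Prec_hat, Sigma, r):
    """Squared Frobenius distance between Prec_hat and Sigma^+."""
    return np.linalg.norm(Prec_hat - pinv_rank(Sigma, r), "fro") ** 2

def discriminant_loss(eta_hat, Sigma, mu, r):
    """Squared Euclidean distance between eta_hat and eta = Sigma^+ mu."""
    return np.sum((eta_hat - pinv_rank(Sigma, r) @ mu) ** 2)

# Toy check with p = 3 and r = 2: plugging in the truth itself.
Sigma = np.diag([2.0, 1.0, 0.0])
mu = np.ones(3)
loss_cov = invariant_squared_loss(Sigma, Sigma, 2)
loss_prec = frobenius_loss(pinv_rank(Sigma, 2), Sigma, 2)
loss_disc = discriminant_loss(pinv_rank(Sigma, 2) @ mu, Sigma, mu, 2)
```

Note that the invariant squared loss of the true $\Sigma$ is $p - r$ rather than zero, since $\Sigma\Sigma^+$ is a rank-$r$ projector and not the identity.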


2.2. Covariance matrix estimation 

The standard estimator for a covariance matrix is the sample covariance
matrix $S$. An alternative is the unbiased estimator $\frac{n}{n-1}S$, which corrects
for the loss in degrees of freedom from not knowing $\mu$. We will look for
estimators that improve over these benchmarks and study their performance.

We first show that an unbiased estimator of the risk holds for
orthogonally invariant estimators, that is, estimators of the form $\hat\Sigma = O_1\Psi O_1^t$ with
$\Psi = \mathrm{diag}(\psi_1, \ldots, \psi_r)$ twice-differentiable functions of $L = \mathrm{diag}(l_1, \ldots, l_r)$.

Theorem 1 (Unbiased risk estimation for singular covariance matrices). 
Let 1 < r < n - 1 and define
(2.3)


(2.4) 


Let us now consider estimators that are proportional to the sample
covariance matrix, that is, of the form $aS$ for $a$ constant. The following result
provides the optimal proportionality factor.

Proposition 1. Let $1 < r < n - 1$. The optimal estimator of $\Sigma$ of the form
$aS$ for $a \in \mathbb{R}$ a deterministic constant is $\hat\Sigma_{HF1} = \frac{n}{n+r}S$, with risk

$$E\left[\mathrm{tr}\left((\hat\Sigma_{HF1}\Sigma^+ - I_p)^2\right)\right] = p - \frac{(n-1)r}{n+r}.$$

In particular $\hat\Sigma_{HF1}$ dominates $S$, which itself dominates $\frac{n}{n-1}S$.

Thus $\frac{n}{n-1}S$ and $S$ are inadmissible. We can further extend this result by
considering a larger class of estimators of the form $\frac{n}{n+r}[S + t\,SS^+\mathrm{tr}^{-1}(S^+)]$
for $t$ constant. Estimators of this shape were first considered by Haff (1980).
Although computing the exact risk of these estimators is difficult, it is
possible to bound the difference in risk with the one of $\hat\Sigma_{HF1}$ as follows.


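Proposition 2. Let $1 < r < n - 4$. Then the risk of estimators of the form
$\hat\Sigma_t = \frac{n}{n+r}[S + t\,SS^+\mathrm{tr}^{-1}(S^+)]$ for $t \in \mathbb{R}$ can be bounded by

$$E\left[\mathrm{tr}\left((\hat\Sigma_t\Sigma^+ - I_p)^2\right)\right] \le E\left[\mathrm{tr}\left((\hat\Sigma_{HF1}\Sigma^+ - I_p)^2\right)\right] + \left[\frac{(n-r)(n-r+2)}{(n+r)^2}t^2 - 2\frac{(n-r)(r-1)}{(n+r)^2}t\right] E\left[\frac{\mathrm{tr}(S^{+2})}{\mathrm{tr}^2(S^+)}\right]. \tag{2.5}$$

The constant that minimizes this upper bound is $t = \frac{r-1}{n-r+2}$. When $r > 1$,
the estimator $\hat\Sigma_{HF2} = \frac{n}{n+r}\left[S + \frac{r-1}{n-r+2}SS^+\mathrm{tr}^{-1}(S^+)\right]$
dominates $\hat\Sigma_{HF1}$.

Thus $\hat\Sigma_{HF1}$ is itself inadmissible for $r > 1$. Although this result does
not show $\hat\Sigma_{HF2}$ optimal within the class, the estimator is likely to have good
overall risk properties.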
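
A minimal sketch of how the two covariance estimators of this subsection could be computed from a rank-$r$ sample covariance matrix is given below. It assumes the scaling $n/(n+r)$ for the first estimator and takes $t = (r-1)/(n-r+2)$, the minimizer of the quadratic in $t$ in (2.5), for the second; the function name and the simulated data are hypothetical.

```python
import numpy as np

def haff_type_estimators(S, n, r):
    """Return (Sigma_HF1, Sigma_HF2) from a rank-r sample covariance S."""
    vals, vecs = np.linalg.eigh(S)
    l, O1 = vals[-r:], vecs[:, -r:]          # non-zero eigenvalues and p x r basis
    proj = O1 @ O1.T                         # projector S S^+
    tr_S_plus = np.sum(1.0 / l)              # tr(S^+)
    hf1 = n / (n + r) * S
    t = (r - 1) / (n - r + 2)                # minimizer of the bound in t
    hf2 = n / (n + r) * (S + t * proj / tr_S_plus)
    return hf1, hf2

# Hypothetical data: centered sample from a rank-3 normal distribution.
rng = np.random.default_rng(4)
n, p, r = 25, 8, 3
X = rng.standard_normal((n, r)) @ rng.standard_normal((r, p))
Xc = X - X.mean(axis=0)
S = Xc.T @ Xc / n
hf1, hf2 = haff_type_estimators(S, n, r)
```

Both estimators keep the column space of $S$; the second one simply adds a constant to every non-zero eigenvalue before rescaling.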


2.3. Precision matrix estimation 

A standard estimator for a singular precision matrix is the Moore-Penrose
pseudoinverse of the sample covariance matrix $S^+$. Note that by Muirhead
(1982, page 97, Equation (12)) we have

$$E[S^+] = \frac{n}{n-r-2}\Sigma^+$$

for $n - r - 2 > 0$. Thus in this case an alternative could be the unbiased
estimator $\frac{n-r-2}{n}S^+$. We will look for estimators that improve over these
benchmarks and study their performance.
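
The expectation $E[S^+] = \frac{n}{n-r-2}\Sigma^+$ can be checked by simulation. The sketch below is a small Monte Carlo experiment under assumed sizes ($n = 30$, $p = 4$, $r = 2$) and an arbitrary seed; it is only meant to illustrate the bias of $S^+$, not to reproduce the numerical study of Section 3.

```python
import numpy as np

def pinv_rank(A, r):
    """Moore-Penrose inverse of a symmetric PSD matrix of known rank r."""
    vals, vecs = np.linalg.eigh(A)
    vals, vecs = vals[-r:], vecs[:, -r:]
    return (vecs / vals) @ vecs.T

rng = np.random.default_rng(1)
n, p, r, reps = 30, 4, 2, 4000              # hypothetical simulation sizes
B = rng.standard_normal((p, r))             # Sigma = B B^t has rank r
Sigma_plus = pinv_rank(B @ B.T, r)

mean_S_plus = np.zeros((p, p))
for _ in range(reps):
    X = 1.0 + rng.standard_normal((n, r)) @ B.T   # mean vector (1, ..., 1)
    Xc = X - X.mean(axis=0)
    mean_S_plus += pinv_rank(Xc.T @ Xc / n, r) / reps

target = n / (n - r - 2) * Sigma_plus       # claimed expectation of S^+
rel_err = np.linalg.norm(mean_S_plus - target) / np.linalg.norm(target)
naive_err = np.linalg.norm(mean_S_plus - Sigma_plus) / np.linalg.norm(Sigma_plus)
```

Under these settings the Monte Carlo average should sit close to `target`, while its relative distance to $\Sigma^+$ itself should be near $n/(n-r-2) - 1$.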

We first show that an unbiased estimator of the risk holds for
orthogonally invariant estimators, that is, estimators of the form $\hat\Sigma^+ = O_1\Psi O_1^t$ with
$\Psi = \mathrm{diag}(\psi_1, \ldots, \psi_r)$ twice-differentiable functions of $L = \mathrm{diag}(l_1, \ldots, l_r)$.

Theorem 2 (Unbiased risk estimation for singular precision matrices). Let 
1 < r < n — 1. Assume the regularity condition 
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n — r — 2 dfjk 1 V'fc ~ V’fe 
Let us now consider estimators that are proportional to the
Moore-Penrose inverse of the sample covariance matrix, that is, of the form $aS^+$
for $a$ constant. The following optimality result holds over this class.

Proposition 3. Let $1 < r < n - 5$. The risk of estimators of the form $aS^+$
for $a \le \frac{n-r-2}{n}$ can be bounded in terms of the risk of $\frac{n-r-2}{n}S^+$ by

$$E\left[\|aS^+ - \Sigma^+\|_F^2\right] \le E\left[\left\|\frac{n-r-2}{n}S^+ - \Sigma^+\right\|_F^2\right] + \left(a - \frac{n-r-2}{n}\right)\left(a - \frac{n-r-6}{n}\right) E\left[\mathrm{tr}(S^{+2})\right]. \tag{2.6}$$

The constant that minimizes this upper bound is $a = \frac{n-r-4}{n}$, and the
corresponding estimator $\hat\Sigma^+_{EM1} = \frac{n-r-4}{n}S^+$ dominates $\frac{n-r-2}{n}S^+$, which itself
dominates $S^+$.

Thus $\frac{n-r-2}{n}S^+$ and $S^+$ are inadmissible. Note that our bound on the risk
only holds for $a \le \frac{n-r-2}{n}$. Presumably, estimators $aS^+$ with $a > \frac{n-r-2}{n}$ do
not dominate $\frac{n-r-2}{n}S^+$, but we have not been able to prove this hypothesis.

In any case, we can further extend this result by considering a larger
class of estimators of the form $\frac{n-r-4}{n}[S^+ + t\,SS^+\mathrm{tr}^{-1}(S)]$ for $t$ constant.
Estimators of this form were first considered by Efron and Morris (1976). It
is possible to bound the difference in risk with the one of $\hat\Sigma^+_{EM1}$ as follows.


Proposition 4. Let $1 < r < n - 5$. The risk of estimators of the form
$\frac{n-r-4}{n}[S^+ + t\,SS^+\mathrm{tr}^{-1}(S)]$ for $t \in \mathbb{R}$ can be bounded in terms of the
risk of $\hat\Sigma^+_{EM1}$ through
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The constant that minimizes this upper bound is t = and the cor¬ 


responding estimator S 

y+ 

^EMl- 


n—r—4 


EM2 


+ 2-^, SS+tr-\S) 


dominates 


Thus $\hat\Sigma^+_{EM1}$ is itself inadmissible. Again, although these results do not
show $\hat\Sigma^+_{EM1}$ and $\hat\Sigma^+_{EM2}$ optimal within their classes, they are likely to possess
good overall risk properties.


2.4. Discriminant coefficients estimation

A standard estimator for a singular discriminant coefficient is $S^+\bar X$. Note
that since $\bar X$ and $S$ are independent, we have

$$E[S^+\bar X] = \frac{n}{n-r-2}\Sigma^+\mu$$

for $n - r - 2 > 0$. Thus in this case an alternative could be the unbiased
estimator $\frac{n-r-2}{n}S^+\bar X$. We will look for estimators that improve over these
benchmarks and study their performance.

We first show that an unbiased estimator of the risk holds for estimators
of the form $\hat\eta = O_1\Psi O_1^t\bar X$ with $\Psi = \mathrm{diag}(\psi_1, \ldots, \psi_r)$ twice-differentiable
functions of $L = \mathrm{diag}(l_1, \ldots, l_r)$.

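Theorem 3 (Unbiased risk estimation for singular discriminant coefficients).
Let $\Psi^* = \mathrm{diag}(\psi_1^*, \ldots, \psi_r^*)$ with

$$\psi_k^* = \frac{n-r-2}{n}\frac{\psi_k}{l_k} + \frac{2}{n}\frac{\partial\psi_k}{\partial l_k} + \frac{1}{n}\sum_{b \neq k}\frac{\psi_k - \psi_b}{l_k - l_b}.$$

Assume the regularity conditions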
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Then 
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- E 
Let us now consider estimators that are proportional to the naive
estimator, that is, of the form $aS^+\bar X$ for $a$ constant. The following optimality
result holds over this class.



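Proposition 5. Let $1 < r < n - 5$. The risk of estimators of the form
$aS^+\bar X$ for $a \le \frac{n-r-2}{n}$ can be bounded in terms of the risk of $\frac{n-r-2}{n}S^+\bar X$ by

$$E\left[\|aS^+\bar X - \eta\|_2^2\right] \le E\left[\left\|\frac{n-r-2}{n}S^+\bar X - \eta\right\|_2^2\right] + \left(a - \frac{n-r-2}{n}\right)\left(a - \frac{n-r-4}{n}\right) E\left[\bar X^t S^{+2}\bar X\right]. \tag{2.8}$$

The constant that minimizes this upper bound is $a = \frac{n-r-3}{n}$, and the
corresponding estimator $\hat\eta_{TK1} = \frac{n-r-3}{n}S^+\bar X$ dominates $\frac{n-r-2}{n}S^+\bar X$, which itself
dominates $S^+\bar X$.

Thus $\frac{n-r-2}{n}S^+\bar X$ and $S^+\bar X$ are inadmissible. Again, note that our bound on
the risk only holds on the subset $a \le \frac{n-r-2}{n}$. Presumably, estimators $aS^+\bar X$
with $a > \frac{n-r-2}{n}$ do not dominate $\frac{n-r-2}{n}S^+\bar X$, but we have not been able to
prove this result.

We can further extend this result by considering a larger class of
estimators of the form $\frac{n-r-3}{n}[S^+ + t\,SS^+\mathrm{tr}^{-1}(S)]\bar X$ for $t$ constant. Estimators of
this form were first considered by Dey and Srinivasan (1991). It is possible
to bound the difference in risk with the one of $\hat\eta_{TK1}$ as follows.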


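Proposition 6. Let $1 < r < n - 5$. The risk of estimators of the form
$\hat\eta_t = \frac{n-r-3}{n}[S^+ + t\,SS^+\mathrm{tr}^{-1}(S)]\bar X$ for $t \in \mathbb{R}$ can be bounded in terms of the
risk of $\hat\eta_{TK1} = \frac{n-r-3}{n}S^+\bar X$ through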


E[||r)t- 


12 ^ 


< E 


hTKl 


+ 


(n — r — 3) 




2(r + l)t + (n — r — 3)t^ 


E 


Ltr(5)J 


(2.9) 


The constant that minimizes this upper bound is t = — , 

responding estimator fjTK2 = ^~n~^ ~ „!(^i3 <S'5'^tr~^(5) 

VTKl- 


and the cor- 
X dominates 


Thus $\hat\eta_{TK1}$ is itself inadmissible. Once again, although these results do
not show $\hat\eta_{TK1}$ and $\hat\eta_{TK2}$ optimal within their classes, they are likely to have
good overall risk properties.


3. Numerical study 

In this section we investigate the risk performance of the proposed
estimators for covariance, precision and discriminant coefficient estimation
through two simulation studies. We also consider the performance of the
diagonal of the sample covariance matrix, $\mathrm{diag}(S)$. In various applications,
using this estimator is a popular approach to overcome the singularity
problem of the sample covariance. Although it ignores the dependency structure,
this estimator is appealing because of its simplicity, and was suggested by
the results of Bickel and Levina (2004).


3.1. Autoregressive simulation 

We let $(n, p)$ be (150,100), (200,100), (200,150) and (250,150). For each
$r$ from 1 to $(n - 4) \wedge p$, we constructed the true covariance matrix $\Sigma$ from
an autoregressive structure with coefficient 0.9 and set its $p - r$ smallest
eigenvalues to zero to create a rank $r$ matrix, as described in Algorithm 1.
We then randomly generated 1,000 replications from a multivariate normal
distribution with mean $\mu = (1, \ldots, 1)$ and singularized autoregressive
covariance $\Sigma$, and computed the resulting sample covariance matrix $S = X^tX/n$.


Algorithm 1: Algorithm for generating $\Sigma$
Data: $p$, $r$
Result: $\Sigma$

for $i, j \in \{1, \ldots, p\}$ do
    $\Sigma_{ij} = 0.5^{|i-j|}$
end

for $k \in \{r+1, \ldots, p\}$ do
    $\lambda_k(\Sigma) = 0$
end
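
One way Algorithm 1 could be implemented in NumPy is sketched below. The function name is hypothetical, and the autoregressive coefficient is exposed as a parameter `rho` because the text quotes 0.9 while the listing uses 0.5; the default here simply follows the listing.

```python
import numpy as np

def singular_ar_covariance(p, r, rho=0.5):
    """Rank-r covariance obtained by truncating an AR(1)-type matrix."""
    idx = np.arange(p)
    Sigma = rho ** np.abs(idx[:, None] - idx[None, :])   # Sigma_ij = rho^|i-j|
    vals, vecs = np.linalg.eigh(Sigma)                    # ascending eigenvalues
    vals[: p - r] = 0.0                                   # zero the p - r smallest
    return (vecs * vals) @ vecs.T                         # reassemble the matrix

Sigma = singular_ar_covariance(p=10, r=4)
```

Zeroing eigenvalues in the spectral decomposition keeps the leading eigenvectors of the autoregressive matrix, so the result is symmetric, positive semi-definite and of rank exactly $r$.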


For the covariance matrix estimation problem, we computed the Percentage
Reduction In Average Loss (PRIAL) with respect to $S$ in invariant
squared loss $L(\hat\Sigma, \Sigma) = \mathrm{tr}[(\hat\Sigma\Sigma^+ - I_p)^2]$ for four estimators. The first three are
the estimators $\frac{n}{n-1}S$, $\hat\Sigma_{HF1} = \frac{n}{n+r}S$ and $\hat\Sigma_{HF2} = \frac{n}{n+r}\left[S + \frac{r-1}{n-r+2}SS^+\mathrm{tr}^{-1}(S^+)\right]$
considered in Subsection 2.2. We also included as fourth estimator the
diagonal of the sample covariance matrix $\mathrm{diag}(S)$. The simulation results are
given in Figure 1. We notice that $\hat\Sigma_{HF1}$ and $\hat\Sigma_{HF2}$ behave similarly, and both
improve substantially on $S$, while the diagonal estimator does much worse.
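
The PRIAL is not written out explicitly in the text. The sketch below assumes the usual definition, $100 \times (\text{average baseline loss} - \text{average estimator loss}) / \text{average baseline loss}$, applied to losses collected over the replications; the function name and the toy loss values are illustrative only.

```python
import numpy as np

def prial(baseline_losses, estimator_losses):
    """Percentage reduction in average loss relative to a baseline estimator."""
    base = np.mean(baseline_losses)
    return 100.0 * (base - np.mean(estimator_losses)) / base

# Toy losses over two replications: the estimator halves the average loss.
losses_naive = np.array([2.0, 4.0])
losses_improved = np.array([1.0, 2.0])
reduction = prial(losses_naive, losses_improved)
```

A positive value indicates an improvement over the baseline, and a negative value indicates a deterioration.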

Similarly, for the precision matrix estimation problem, we estimated the
PRIAL with respect to $S^+$ in the Frobenius loss $L(\hat\Sigma^+, \Sigma^+) = \|\hat\Sigma^+ - \Sigma^+\|_F^2$
for four estimators. The first three are the estimators $\frac{n-r-2}{n}S^+$,
$\hat\Sigma^+_{EM1} = \frac{n-r-4}{n}S^+$ and $\hat\Sigma^+_{EM2}$ from Subsection
2.3. The fourth one is the inverse of the diagonal of the sample covariance
matrix, $\mathrm{diag}(S)^{-1}$. The simulation results are given in Figure 2. We can
see that the first three estimators all improve substantially over $S^+$, but do not
differ significantly in risk. In contrast, the diagonal estimator performs well
when the true matrix is almost full rank, but becomes progressively worse for
smaller covariance ranks.

Finally, for the discriminant coefficient estimation problem, we estimated
the PRIAL with respect to $S^+\bar X$ in the square loss $L(\hat\eta, \eta) = \|\hat\eta - \eta\|_2^2$ for four
estimators. The first three estimators are $\frac{n-r-2}{n}S^+\bar X$, $\hat\eta_{TK1} = \frac{n-r-3}{n}S^+\bar X$
and $\hat\eta_{TK2}$, which were considered in Subsection 2.4.
The fourth one is the estimator $\mathrm{diag}(S)^{-1}\bar X$, which has been
considered in linear discriminant analysis when $p > n$. The simulation
results are given in Figure 3. In this case again, the first three estimators all have
similar risk and substantially improve on the naive estimator, $S^+\bar X$, while
the diagonal estimator is acceptable only when the true covariance matrix
is almost full rank and quite bad otherwise.

3.2. NASDAQ-100 simulation 

To explore more realistic designs than an autoregressive covariance
matrix, we also considered a setting where the true covariance matrix was
constructed from real data.

The NASDAQ-100 is a stock market index composed of the hundred
largest non-financial companies on the NASDAQ. As of 2015, this is
composed of 107 securities, since some companies offer several classes of stock.
We computed the net daily returns of these assets up to March 6, 2015.
The newest security is Liberty Media Corp Series C (LMCK), which was
issued to series A and B shareholders as dividend on July 7, 2014. To avoid
missing data issues, we took this date as the initial time point. This yielded
a sample size of 167 trading days. From this data we computed a $107 \times 107$
sample covariance matrix of the NASDAQ-100 returns.

We then proceeded with the risk simulation as follows. For every $r$ from
1 to $(n - 4) \wedge p$, the true covariance matrix $\Sigma$ was defined as the
NASDAQ-100 sample covariance matrix with its $p - r$ smallest eigenvalues set to zero.
We then randomly generated 1,000 replications from a multivariate
normal distribution with mean $\mu = (1, \ldots, 1)$ and singular covariance $\Sigma$, and
computed the resulting sample covariance matrix $S = X^tX/n$.

For each of the three estimation problems, we computed the PRIAL as
in Subsection 3.1. The simulation results are given in Figure 4. The results
appear similar to the singularized autoregressive setting.
Figure 1: PRIAL of $\frac{n}{n-1}S$, $\hat\Sigma_{HF1}$, $\hat\Sigma_{HF2}$ and $\mathrm{diag}(S)$ with respect to $S$ for
estimating $\Sigma$ in invariant squared loss.


4. Discussion 


An application of the Tsukuma and Kubokawa (2015) technique
developed in Subsection 2.1 in essence reduces the dimension from $p$ to $r$.
Since $r < \min(n, p)$, this in effect turns the problem into a classical setting
where the sample size is greater than the dimension, and allows for classical
proof techniques to be applied.

An interesting extension is the setting where $n < r < p$. In that case, an
adaptation of the method would yield a high-dimensional context where the
true covariance matrix is full rank, but the sample size $n$ is still smaller than
the dimension $p$. Recent work, for example by Konno (2009), could allow
the construction of improved estimators analogous to the ones presented in
this article.

Figure 2: PRIAL of $\frac{n-r-2}{n}S^+$, $\hat\Sigma^+_{EM1}$, $\hat\Sigma^+_{EM2}$ and $\mathrm{diag}(S)^{-1}$ with respect to
$S^+$ for estimating $\Sigma^+$ in Frobenius loss.

Recent attention has been given to the notion of the effective rank of
a matrix $r(A) = \mathrm{tr}(A)/\|A\|_2$, developed by Vershynin (2010) and applied in
the study of spiked covariance matrices in Bunea and Xiao (2015). Singular
covariance matrices can be regarded as a boundary case of spiked matrices
where the noise equals zero. In that regard, it is interesting to notice that
the quantity $\mathrm{tr}(S^{+2})/\mathrm{tr}^2(S^+)$ that appears in inequality (2.5) is related to
the effective rank of $S^+$ through the inequality

$$\left(\frac{\mathrm{tr}(S^{+2})}{\mathrm{tr}^2(S^+)}\right)^{-1/2} \le r(S^+) \le \left(\frac{\mathrm{tr}(S^{+2})}{\mathrm{tr}^2(S^+)}\right)^{-1}.$$

The presence of this quantity is likely connected to the orthogonal invariance
of the loss function.

Figure 3: PRIAL of $\frac{n-r-2}{n}S^+\bar X$, $\hat\eta_{TK1}$, $\hat\eta_{TK2}$ and $\mathrm{diag}(S)^{-1}\bar X$ with respect
to $S^+\bar X$ for estimating $\eta = \Sigma^+\mu$ in squared loss.
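
The relation between the effective rank and the ratio $\mathrm{tr}(A^2)/\mathrm{tr}^2(A)$ discussed above holds for any positive semi-definite matrix and can be checked numerically. The sketch below uses a random positive definite matrix as a stand-in for $S^+$ restricted to its range; the sizes and the seed are arbitrary assumptions.

```python
import numpy as np

def effective_rank(A):
    """Effective rank tr(A) / ||A||_2 of a positive semi-definite matrix."""
    return np.trace(A) / np.linalg.norm(A, 2)   # ord=2 gives the spectral norm

rng = np.random.default_rng(2)
r = 5
G = rng.standard_normal((r, 3 * r))
A = G @ G.T                                     # random positive definite matrix

ratio = np.trace(A @ A) / np.trace(A) ** 2      # tr(A^2) / tr^2(A)
lower, upper = ratio ** -0.5, ratio ** -1.0     # bounds on the effective rank
eff = effective_rank(A)
```

The upper bound follows from $\mathrm{tr}(A^2) \le \|A\|_2\,\mathrm{tr}(A)$ and the lower bound from $\|A\|_2^2 \le \mathrm{tr}(A^2)$.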

Some of the key results in Bickel and Levina (2004) can be extended
to the case where $\mathrm{rk}\,\Sigma = r < p$. Suppose $\Sigma$ is known and let $e_1$ and $e_2$
equal the limiting Bayes risk of the classification rule using $\Sigma^+$ and $\mathrm{diag}(\Sigma)$,
respectively. By an application of an extended Kantorovich inequality for
generalized inverses, developed by Liu and Neudecker (1997), it can be shown
that $e_2 \le \bar\Phi\left(\frac{2\sqrt{K_r}}{1+K_r}\bar\Phi^{-1}(e_1)\right)$, where $\bar\Phi$ is the Gaussian survival function and $K_r =
\lambda_1/\lambda_r$ with $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_r$ the non-zero eigenvalues of $(\mathrm{diag}(\Sigma))^{-1}\Sigma$.
In the setting of Bickel and Levina (2004), $\Sigma$ is assumed to be full rank so
that the limiting Bayes risk of the classification rule using $\mathrm{diag}(\Sigma)$ is close
to optimal. However, in the rank deficient case $K_q = \infty$ for $q > r$, which
implies that the diagonal rule gives rise to a procedure that is no better
than random guessing, that is with $e_2 = 1/2$. This behavior is evident from
Figures 3 and 4. In the case where the rank is close to $p$, the risk of the
diagonal-based discriminant estimator is close to that of the improved estimators;
however, as the rank of $\Sigma$ declines from $p$, the risk properties of the
diagonal-based discriminant estimator become inferior.

Figure 4: PRIAL for the singularized NASDAQ-100 covariance matrix in
the three estimation tasks.

Finally, in applications where a singular covariance matrix is unlikely
but a low-dimensional approximation is desired, it might be beneficial to
use one of the estimators proposed in this article and cross-validate the
rank $r$ on the task to accomplish. For example, a mean-variance portfolio
optimization problem could use $\hat\Sigma^+_{EM2}$ as precision matrix estimate, with
rank $r$ cross-validated on some validation set. To the best of our knowledge,
this methodology has no theoretical grounding but might nevertheless prove
useful in some high-dimensional problems.


5. Proofs 


5.1. Preliminaries 

Before presenting the proofs of the statements from Section 2, we explain
the techniques employed by Tsukuma and Kubokawa (2015) to work around
the singularity of the covariates in the model. Define the sample mean and
covariance matrix to be


$$\bar X = X^t 1_n/n \sim N_p(\mu, \Sigma/n),$$

$$S = [X - 1_n\bar X^t]^t[X - 1_n\bar X^t]/n \sim W_p(n-1, \Sigma/n).$$


Since $\Sigma$ has rank $r$, we can factorize it as $\Sigma = BB^t$ for some full rank
$p \times r$ matrix $B$. Let $H = B(B^tB)^{-1/2}$ and $D = B^tB$; then $H$ is $p \times r$
semi-orthogonal ($H^tH = I_r$ and $HH^t = \Sigma\Sigma^+$), $D$ is $r \times r$ invertible, $\Sigma = HDH^t$
and $\Sigma^+ = HD^{-1}H^t$. Since $\Sigma$ is rank deficient, there must be a
$Z \sim N_{n \times r}(0, I_r)$ such that $X = 1_n\mu^t + ZB^t$, and therefore we can write $X =
1_n\mu^t + Z(B^tB)^{1/2}(B^tB)^{-1/2}B^t = 1_n\mu^t + YH^t$ for $Y = ZD^{1/2} \sim N_{n \times r}(0, D)$.
Define then

$$\bar Y = Y^t 1_n/n \sim N_r(0, D/n),$$

$$T = [Y - 1_n\bar Y^t]^t[Y - 1_n\bar Y^t]/n \sim W_r(n-1, D/n).$$

Notice how $T$ is full rank, since $r < n - 1$, in contrast with $S$. Using
$X = 1_n\mu^t + YH^t$, we can see that these constructions are related to $\bar X$ and
$S$ through

$$\bar X = \mu + H\bar Y, \qquad S = HTH^t.$$

Recall that $SS^+ = \Sigma\Sigma^+$ almost surely, from Equation (2.2). Since $S$
has rank $r < p$, there must be a $p \times r$ semi-orthogonal matrix $O_1$ such
that $O_1^tO_1 = I_r$, $O_1O_1^t = SS^+$ almost surely and $S = O_1LO_1^t$ for $L =
\mathrm{diag}(\lambda_1(S), \ldots, \lambda_r(S))$. The $r \times r$ matrix $U = H^tO_1$ is easily seen to be
orthogonal, and so by $T = H^tSH = H^tO_1LO_1^tH = ULU^t$, we see that $T$
and $S$ must share the same $r$ non-zero eigenvalues, i.e. $\lambda_i(S) = \lambda_i(T)$.

These constructions and facts form the basis of our risk estimation
procedures and the notation will be repeatedly used in the following subsections.
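
The reduction from the $p$-dimensional singular model to the $r$-dimensional full rank one can be traced numerically. The following sketch builds $H$, $D$, $Y$ and $T$ for a hypothetical rank-2 example and exposes the quantities needed to check that $S = HTH^t$ and that $S$ and $T$ share their non-zero eigenvalues; sizes and seed are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(3)
n, p, r = 15, 5, 2                      # hypothetical sizes with r < n - 1
B = rng.standard_normal((p, r))         # Sigma = B B^t
mu = np.ones(p)

# D = B^t B and H = B (B^t B)^{-1/2}, via the eigendecomposition of B^t B
D = B.T @ B
w, V = np.linalg.eigh(D)
H = B @ (V / np.sqrt(w)) @ V.T

Z = rng.standard_normal((n, r))
Y = Z @ (V * np.sqrt(w)) @ V.T          # Y = Z D^{1/2}
X = mu + Y @ H.T                        # equals 1_n mu^t + Z B^t

Xc, Yc = X - X.mean(axis=0), Y - Y.mean(axis=0)
S, T = Xc.T @ Xc / n, Yc.T @ Yc / n     # p x p singular, r x r full rank

eig_S = np.linalg.eigvalsh(S)[-r:]      # r largest eigenvalues of S
eig_T = np.linalg.eigvalsh(T)           # all eigenvalues of T
```

Working with $T$ instead of $S$ is what allows the classical full rank Wishart identities to be applied in the proofs that follow.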
5.2. Proofs of Subsection 2.2

Proof of Theorem^ Since T and S share the same non-zero eigenvalues, we 
can regard 'h as a function of T ~ Wr{n — 1, ff/n) only. Since r < n — 1 and 
is full rank, we can apply Lemma 1 and 2 of Chetelat an d Wells (20^) 


to = U'^U^. On that result, one can also consult ISheena 


Theorem 4.1), and in the singular case Kubokawa and Srivastava 


(1995 


(2008 


Proposition 2.1) and Konno (2009, Theorem 2.4). In any case, we get 

■(^[SS+ - = E p- 2tr(^S+E) -h tr(^E+ES+E) 

= P[{p-r)+r-2tT{PL-^U^U') tr(fl"^[/4'C/'0"^f7TC/')] 


E 


= E 


= E 


p - r + tr [U^U'n-^ -IrY') 


^p-r) + r+ + 

nf^yk-lb 


n 


1 

-E 


n-rs 


under the regularity conditions 


E 


E 


n — r — 2 'ipk 2 dipk 1 f’k ~ 'f’b 

^ n Ih n ^ dlh n y h — L 

k=l k=l ” 


< oo. 


n — r — 2 '0^ -|- 20fc ^ 2 

n Ik n ^ 

k=l k=l 


E 


1 

+-E 

n ^ 


-k 2ipk - 0b - 20b 


k^b 




difl + 20fc 

dlk 

< oo and E 


E 

Lfc=l 


0fc + 20fc 


^k 


< OO. 


But these are satisfied by Inequalities (2.3). This concludes the proof. □ 
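Proof of Proposition 1. Let us apply the results of Theorem 1. We have
$\psi_k = a l_k$, so

$$\psi_k^* = \left[\frac{n-r-2}{n}a + \frac{4}{n}a + \frac{2}{n}a(r-1) - 2\right] a l_k = \left[\frac{n+r}{n}a - 2\right] a l_k.$$

Then the unbiased risk estimator (2.4) equals

$$\begin{aligned}
U &= p + \sum_{k=1}^r \left\{\frac{n-r-2}{n}\frac{\psi_k^*}{l_k} + \frac{2}{n}\frac{\partial\psi_k^*}{\partial l_k} + \frac{1}{n}\sum_{b \neq k}\frac{\psi_k^* - \psi_b^*}{l_k - l_b}\right\} \\
&= p + \frac{n-r-2}{n}\left[\frac{n+r}{n}a - 2\right]ar + \frac{2}{n}\left[\frac{n+r}{n}a - 2\right]ar + \frac{1}{n}\left[\frac{n+r}{n}a - 2\right]ar(r-1) \\
&= p - 2\frac{(n-1)r}{n}a + \frac{(n-1)(n+r)r}{n^2}a^2.
\end{aligned}$$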


Clearly, E 


U 


^{n-l)r I (n-l)(n+r)r „2 
' ~ ^® “I - ::ri -® 


< oo. Similarly, 


E 


^n-r - 2ij}k , 2^ dipk , 1 V'fc - V’b 

p+y -^ + > 

/ ^ 71 li 71 / ^ f)l 1 71 / ^ 


k=l 

= E 


^ dlk ^k-h 


in — r — 2)r 2r r(r — 1) 

p -\- ^ ^.-_- -.i(i -|- — d -|-- d 

n n n 


(n — l)r 
p^- - 


E 


E 

Lfe=i 


rk 


Ik 


n 


= r 


< oo. 


n + r 


a — 2 


n 


< oo, E 


E 

U=i 


il^k 


Ik 


= ra < oo. 


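Thus the regularity conditions of Theorem 1 are satisfied and

$$E\left[\mathrm{tr}\left([aS\Sigma^+ - I_p]^2\right)\right] = E[U] = p - 2\frac{(n-1)r}{n}a + \frac{(n-1)(n+r)r}{n^2}a^2.$$

But this is minimized when $a = \frac{n}{n+r}$. In particular, notice that since
$n > r + 1 \ge 2$,

$$\begin{aligned}
E\left[\mathrm{tr}\left([\hat\Sigma_{HF1}\Sigma^+ - I_p]^2\right)\right] &= p - \frac{(n-1)r}{n+r} \\
&< p - \frac{(n-r)(n-1)r}{n^2} = E\left[\mathrm{tr}\left([S\Sigma^+ - I_p]^2\right)\right] \\
&< p - \frac{(n-r-2)r}{n-1} = E\left[\mathrm{tr}\left(\left[\tfrac{n}{n-1}S\Sigma^+ - I_p\right]^2\right)\right],
\end{aligned}$$

so $\hat\Sigma_{HF1}$ dominates $S$, which dominates $\frac{n}{n-1}S$, as desired.

□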
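
The closing comparison in this proof is plain arithmetic on the risk of $aS$, which can be evaluated directly. The sketch below assumes the quadratic risk expression $p - 2\frac{(n-1)r}{n}a + \frac{(n-1)(n+r)r}{n^2}a^2$ and plugs in one of the simulation settings of Section 3 with an arbitrarily chosen rank; the function name is hypothetical.

```python
def risk_scaled_S(a, n, p, r):
    """Risk of a*S under the invariant squared loss, as a quadratic in a."""
    return p - 2 * (n - 1) * r * a / n + (n - 1) * (n + r) * r * a ** 2 / n ** 2

n, p, r = 150, 100, 20                               # r chosen for illustration
risk_hf1 = risk_scaled_S(n / (n + r), n, p, r)       # optimal multiple of S
risk_S = risk_scaled_S(1.0, n, p, r)                 # sample covariance
risk_unbiased = risk_scaled_S(n / (n - 1), n, p, r)  # unbiased estimator
closed_form_hf1 = p - (n - 1) * r / (n + r)
```

The three risks are ordered as in the proof, and the gaps between them are small relative to the common term $p - r$ contributed by the null space of $\Sigma$.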
Proof of Proposition 2. Again, let us apply the results of Theorem 1. Here
$\psi_k = \frac{n}{n+r}[l_k + t/\mathrm{tr}(S^+)]$, so using that $\frac{\partial}{\partial l_k}\mathrm{tr}^{-1}(S^+) = \frac{1}{l_k^2\,\mathrm{tr}^2(S^+)}$


rk = 


n-r -2ipk , 4 df^k , - A ^ 

b^k 


n 




n 


n + r 


n 

n + r 
n 


n — r — 2 


n + r 


1 + 


t 


+ 


lktr{S+) 
2(r-l) 


+ 


n + r 


V’fc 
1 + 


lltT^{S+) 


1 + 


-h + 


n + r 
n — r — 2 t 


- 2 



Li M 


r tr(5+)J 


+ 


n 


+ r 4tr(5+) n + r 


- 2 


Ik + 


tr(5+) 


- 2 


r + 1 1 


n + r 

n — r — 2 

+ 


+ 


n + rtr(S'+) n + r/fctr^(S'+) J n + r 


nt 


+ 


nt 


n + r 4ti'^(S'+) n + r/|tr^(S'+) J n + r 
Let us now compute the terms in the URE. We find for the first term: 


n — r — 2 


n 


E 

k=l 


^k 


n — r — 2 


n — r — 2 

+ -E 

k=l 


+ 


n 

n — r — 2 


n 


- 2 


E 


n 


k=l 
r + 1 


n + r 

1 


+ 


n 


+ r 4tr ('S'"*') n + r/^tr2(S'+) J n + r 


nt 


n 


E 

k=l 


n — r — 2 


+ 


n + r /|tr^(S'+) n + r/|tr^(5+) J n + r 


nt 


n — r — 2 n — r — 2 

-r H- 

n + r 


n + r 
n — r — 2 
n + r 

Next, using the fact that ^ 


+ 


^r + l 4 tr(5+^) 

n + r n + r tr^(S'+) 

2 


t 


n — r — 2 4 tr(5+^) 

n + r tr^(S'+) n + r tr3(5+) 




a 1 

dh llt+(S+) 

2 difl 

n ^ dlk 
k=i 


dlk lkt+{S+) 


+ 


llt+is+) ^ llt+is+) 


and that 


l'tt+(S+} + i^tr+S+)’ 

2 d n 

n ^ dlk n + r ^ 
k=l 


we find 


2^d_ 

n ^ dlk 


k=l 


r + 1 1 4 1 

n + r tr (5+) n + r 4tr2(S'+) 


nt 

n + r 
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1 V — 

n ^ dlk 
k=i 


n — r — 2 


n 


+ r /fctr^(S'+) n + r/^tr^(5+) n + r 


nt^ 


2 ^ n 

n n + r 
fc=i 


+ ^E 

n 

fc=i L 


- 2 


r + 1 


n 


+ rZ^tr2(S'+) n + r/^tr2(S'+) 


+ 


nt 


+ 


2r 


n + r lltT^{S+) n + r 
n — r — 2 2 


9 

+ -E 

n 


fc=i 


n — r — 2 


1 


n + r /^tr^(S'+) 
4 3 

+ 


n 


+ 


+ r Z|tr3(S'+) n + r/|tr3(S'+) n + r/|tr"^(5+)_ n + r 


nV 


+ 


n+r n+r 
2 


^r + 3tr(S'+2) ^ 8 tr(5+3) 


n 


+ rtr2(5+) n + rtr3(S'+) 


n + r 


n — r — 2 tr(5^^) ^ ^n — r — 6 tr(5''“^) ^ 12 tr(S''''‘^) 


n + r tr^ (5+) 


n 


+ r tr^ (5+) n + rtr'^(S'+) 




Finally, using that ^ 

-E 

r? ^ 




kj^b Ik-h 
r 


n 1 
n ^ n + r n 


< 0 and Efc/f, ^ 

4 1 


< 0 we can bound 


— r 

E ht _^ 

4 - k 


nt 

n + r 


1 

H— 
n 


n — r — 2 


1 


EjlE.g. ^ 


1 


■E 


IFT^-i 


-2 


< - 


n + r tr2(5+) ^ h - k n + rtr3(5+)^ h - h 
r(r - 1) 


nt^ 
n + r 


n + r 


Hence the URE (|2.4l) equals 


U = p + 


n — r — 2 


n 


^ 1 + i 

1.1. n. Fil l. n. 


k=l 


V’fc - i’l 


n — r — 2 

< p -r — 


+ 

+ 

+ 

+ 


n + r 
n — r — 2 


n + r 


n f-f dlk 

k=l 
r — 1 




n + r 
2 

n + r 
n — r — 2 


n + r 

4 tr(5+2) 


t 


n + r 
2 


r + 1 

_ 2__I_ 

n + r n + rtr^(5+) 

r + 3 tr(S'+^) 8 tr(5'''^) 

n + r tr^ (5+) n + r tr^ (5+) 

n — r — 2 tr(5'''^) ^ 4 tr(S'+^) 


t 


n 


+ r tr2(S'+) n + r tr^(5+) 


n + r 


n — r — 2 tr(5''“^) ^ ^n — r — 6 tr(S^^) ^ 12 tr(5'''^) 

' H” " ^ ^ Q / ^ \ \ H” 


n 


+ r tr^ (5+) 


n 


+ r tr^ (5+) n + rtr^(S'+) 
in — l)r

= P-- -^ + 


n + r 


^ (n — r — 2)(r + 1) ^ ^ n — 2r — 5 tr(5+^) 

■" To “r 4“ 


(n + r)2 


(n + r)2 tr^(5+) 


+ 


16 tr(g+3) 

(n + r)^ tr^ ('S'"*') J 


t + 


[n — r — 2){n — r — A) tr(5’''^) 


(n + r)^ 


tr2(5+) 


n — r — 4 tr(5+^) 24 tr(5+^) 

(n + r)2 tr^(S'+) (n + r)^ tr*^ (5+) 


t\ 


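Now note that $\operatorname{tr}(S^{+3}) \le \operatorname{tr}^{1/2}(S^{+4})\operatorname{tr}^{1/2}(S^{+2}) \le \operatorname{tr}(S^{+2})\operatorname{tr}(S^{+})$ and $\operatorname{tr}(S^{+4}) \le \operatorname{tr}^{1/2}(S^{+6})\operatorname{tr}^{1/2}(S^{+2}) \le \operatorname{tr}(S^{+3})\operatorname{tr}(S^{+}) \le \operatorname{tr}(S^{+2})\operatorname{tr}^{2}(S^{+})$. Then since $r < n-4$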


and — 1 < we can write 


rr ^ (n — l)r 
U < P-- - + 


n + r 


^ (n — r — 2)(r + 1) tr(S'+^) ^ ^ n — 2r — 5 tr(S'+^) 

" / \r» .O/^iN H~ ^ 


(n + r)2 tr2(5+) (n + r)^ tr2(S'+) 


+ 


16 tr(5’''^) 

(n + r)2 tr^(S'+) 


t + 


{n — r — 2)(n — r — 4) tr(5’''^) 


(n + r)2 


tr2(5+) 


n — r — 4 tr(5+^) 24 tr(S'+^) 

(n + r)2 tr2(S'+) (n + r)^ 11^(5+) 


(n — l)r 

< P ~ 1“ 

n + r 


(n —r)(n —r + 2) 2 — r){r — 1) 


(n + r)2 

Now, using that < 1 we find 


tr(5’''^) 


(n + r)2 Jtr^(5+)' 

(5.1) 


E 


n — r — 2^fc 2 dif^k 1 V’fc ~ V'b 

^ ^ n Ik n ^ dlk n ^ Ik — h 

k=l k=l k^b 


= E 


n — r — 2 

p+ ... - E 


n + r 


fc=i L 


1 + 


4tr(5+) 


+ - 


n + r 


E 

k=l 


1 + 


/^tr2(S'+)J n + r 


+ 


1 ^ 

— El 

_|_ r ^^ 




n — r — 2 , . 2 , , r(r — 1) 

p H- , (r + t) H-(r + t) + 


E 


E 


E 

k=l 

r 

E 

k=l 


Vjfc 

^k 

Vi 

^k 


n + r 


= E 


n + r 


n + r 


< 00 , 


n 


n + r 


E 

fc=i ■- 


1 + 


4tr(5+) 


n 


n + r 


|r + t| < 00 , 


n 


n + r 


E 


—r + 


^r + l 4 tr(5’''^) 
n + r n + rtr^(5+)J 


t 
<


n 


n + r 


, r + 1 4 

r+ 2-+ 


n tr(5+^) 4 tr(S'+^) 

n + r tr2(S'+) n + r tr^(5+) 
n 4 




n+r n+r 


+ 


n + r n + r 




< oo 


and by (5.1)
E 


U 


(n — l)r 

< P H ~ I" 
n + r 


{n - r){n - r + ^ (w - r)(r - 1) ^' 


< oo. 


(n + r)2 (n + r)^ 

Thus all the regularity conditions of Theorem are satisfied, and we find 

(n — ^^r r(n — r)(n — r + 2) 2 (n —r)(r —1) 

— t —2-;-tt ;—t E 


E 


tr( 

(n — l)r 
n + r 


< P-- 


+ 


(n + r)^ 


{n + r)^ 


tr(5+2) 


_tr2(,S+)J’ 


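which proves the stated inequality. To minimize this upper bound, notice that since $\mathrm{E}\big[\operatorname{tr}(S^{+2})/\operatorname{tr}^{2}(S^{+})\big] > 0$, it is enough to minimize the quadratic coefficient
\[
\frac{(n-r)(n-r+2)}{(n+r)^{2}}\,t^{2} - 2\,\frac{(n-r)(r-1)}{(n+r)^{2}}\,t .
\]
This is achieved precisely when $t = \frac{r-1}{n-r+2}$. When $r > 1$, this makes the quadratic coefficient strictly negative, which in view of Proposition guarantees
\[
\mathrm{E}\Big[\operatorname{tr}\big([\hat\Sigma_{HF2}\Sigma^{+} - I_p]^{2}\big)\Big] < p - \frac{(n-1)r}{n+r} = \mathrm{E}\Big[\operatorname{tr}\big([\hat\Sigma_{HF1}\Sigma^{+} - I_p]^{2}\big)\Big].
\]
Thus in this case $\hat\Sigma_{HF2}$ dominates $\hat\Sigma_{HF1}$, as desired. □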

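5.3. Proofs of Subsection 2.3

Proof of Theorem 2. Since $T$ and $S$ share the same non-zero eigenvalues, we can regard $\Psi$ as a function of $T \sim W_r(n-1, \Omega/n)$ only. Since $r < n-1$ and $\Omega$ is full rank we can apply Lemma 2.1 from Dey (1987). However, the proposition is given without proof and, more importantly, without the implied regularity conditions that inevitably come from using Stein's and Haff's lemmas. For completeness, we therefore derive this result again in our context. First, we can write
\begin{align*}
\mathrm{E}\big[\|O_1\Psi O_1^{t} - \Sigma^{+}\|_F^{2}\big] &= \mathrm{E}\big[\|U\Psi U^{t} - \Omega^{-1}\|_F^{2}\big] \\
&= \mathrm{E}\big[\operatorname{tr}(U\Psi^{2}U^{t}) - 2\operatorname{tr}(\Omega^{-1}U\Psi U^{t})\big] + \operatorname{tr}(\Omega^{-2}).
\end{align*}
By Lemma 3 of Chetelat and Wells (2014), this equals
\begin{align*}
&= \mathrm{E}\left[\sum_{k=1}^{r}\psi_k^{2} - 2\left(\frac{n-r-2}{n}\sum_{k=1}^{r}\frac{\psi_k}{l_k} + \frac{2}{n}\sum_{k=1}^{r}\frac{\partial\psi_k}{\partial l_k} + \frac{1}{n}\sum_{k\neq b}\frac{\psi_k-\psi_b}{l_k-l_b}\right)\right] + \operatorname{tr}(\Omega^{-2}) \\
&= \mathrm{E}\left[\sum_{k=1}^{r}\psi_k^{2} - 2\,\frac{n-r-2}{n}\sum_{k=1}^{r}\frac{\psi_k}{l_k} - \frac{4}{n}\sum_{k=1}^{r}\frac{\partial\psi_k}{\partial l_k} - \frac{2}{n}\sum_{k\neq b}\frac{\psi_k-\psi_b}{l_k-l_b}\right] + \operatorname{tr}(\Omega^{-2})
\end{align*}
under the regularity condition
\[
\mathrm{E}\left|\frac{n-r-2}{n}\sum_{k=1}^{r}\frac{\psi_k}{l_k} + \frac{2}{n}\sum_{k=1}^{r}\frac{\partial\psi_k}{\partial l_k} + \frac{1}{n}\sum_{k\neq b}\frac{\psi_k-\psi_b}{l_k-l_b}\right| < \infty.
\]
The result follows from the fact that $\operatorname{tr}(\Omega^{-2}) = \operatorname{tr}(\Sigma^{+2})$. □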

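Proof of Proposition. We have $\psi_k = a/l_k$, so
\begin{align*}
\sum_{k=1}^{r}\psi_k^{2} &= \sum_{k=1}^{r}\frac{a^{2}}{l_k^{2}} = a^{2}\operatorname{tr}(S^{+2}), \\
-2\,\frac{n-r-2}{n}\sum_{k=1}^{r}\frac{\psi_k}{l_k} &= -2\,\frac{n-r-2}{n}\,a\operatorname{tr}(S^{+2}), \\
-\frac{4}{n}\sum_{k=1}^{r}\frac{\partial\psi_k}{\partial l_k} &= \frac{4}{n}\sum_{k=1}^{r}\frac{a}{l_k^{2}} = \frac{4}{n}\,a\operatorname{tr}(S^{+2}), \\
-\frac{2}{n}\sum_{k\neq b}\frac{\psi_k-\psi_b}{l_k-l_b} &= -\frac{2}{n}\,a\sum_{k\neq b}\frac{l_k^{-1}-l_b^{-1}}{l_k-l_b} = \frac{2}{n}\,a\operatorname{tr}^{2}(S^{+}) - \frac{2}{n}\,a\operatorname{tr}(S^{+2}).
\end{align*}
Summing everything, we get the URE
\[
U = \frac{2}{n}\,a\operatorname{tr}^{2}(S^{+}) + \left(a^{2} - 2\,\frac{n-r-3}{n}\,a\right)\operatorname{tr}(S^{+2}).
\]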
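The term-by-term computation above can be checked numerically for any set of distinct eigenvalues. The sketch below sums the four terms of the unbiased risk estimate for general $\psi_k$ and compares the result with the closed form obtained for $\psi_k = a/l_k$. The function names and the sample values of $l_k$, $n$ and $a$ are illustrative assumptions, not taken from the paper.

```python
def ure_generic(psi, dpsi, l, n):
    """Sum of the four URE terms; dpsi[k] is the derivative of psi_k in l_k."""
    r = len(l)
    t1 = sum(p * p for p in psi)
    t2 = -2.0 * (n - r - 2) / n * sum(p / lk for p, lk in zip(psi, l))
    t3 = -4.0 / n * sum(dpsi)
    t4 = -2.0 / n * sum((psi[k] - psi[b]) / (l[k] - l[b])
                        for k in range(r) for b in range(r) if k != b)
    return t1 + t2 + t3 + t4

def ure_scaled_inverse(a, l, n):
    """Closed form (2/n) a tr^2(S^+) + (a^2 - 2 (n-r-3)/n a) tr(S^{+2})."""
    r = len(l)
    tr1 = sum(1.0 / lk for lk in l)        # tr(S^+)
    tr2 = sum(1.0 / lk ** 2 for lk in l)   # tr(S^{+2})
    return 2.0 / n * a * tr1 ** 2 + (a * a - 2.0 * (n - r - 3) / n * a) * tr2

# Hypothetical non-zero eigenvalues of S (must be distinct), sample size, scale.
l = [0.5, 1.3, 2.0, 4.1]
n, a = 15, 0.6
psi = [a / lk for lk in l]
dpsi = [-a / lk ** 2 for lk in l]
print(ure_generic(psi, dpsi, l, n), ure_scaled_inverse(a, l, n))
```

The two printed values should agree up to floating-point rounding.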


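Now notice that
\begin{align*}
\mathrm{E}\left|\frac{n-r-2}{n}\sum_{k=1}^{r}\frac{\psi_k}{l_k} + \frac{2}{n}\sum_{k=1}^{r}\frac{\partial\psi_k}{\partial l_k} + \frac{1}{n}\sum_{k\neq b}\frac{\psi_k-\psi_b}{l_k-l_b}\right|
&= |a|\,\mathrm{E}\left|\frac{n-r-3}{n}\operatorname{tr}(S^{+2}) - \frac{1}{n}\operatorname{tr}^{2}(S^{+})\right| \\
&\le \frac{n-r-3}{n}\,|a|\,\mathrm{E}[\operatorname{tr}(S^{+2})] + \frac{1}{n}\,|a|\,\mathrm{E}[\operatorname{tr}^{2}(S^{+})].
\end{align*}
Since $T \sim W_r(n-1, \Omega/n)$, by Theorem 2.4.14 (viii) from Kollo and von Rosen (2006) we have the bound
\[
\mathrm{E}[\operatorname{tr}(S^{+2})] \le \mathrm{E}[\operatorname{tr}^{2}(S^{+})] = \mathrm{E}[\operatorname{tr}^{2}(T^{-1})] < \infty \tag{5.2}
\]
when $n-r-4 > 0$, which holds since $r < n-5$. Therefore, the regularity condition holds and we can apply Theorem 2 to conclude that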


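\begin{align*}
\mathrm{E}\big[\|aS^{+} - \Sigma^{+}\|_F^{2}\big] &= \mathrm{E}[U] + \operatorname{tr}(\Sigma^{+2}) \\
&= \frac{2}{n}\,a\,\mathrm{E}[\operatorname{tr}^{2}(S^{+})] + \left(a^{2} - 2\,\frac{n-r-3}{n}\,a\right)\mathrm{E}[\operatorname{tr}(S^{+2})] + \operatorname{tr}(\Sigma^{+2})
\end{align*}
for any $a \in \mathbb{R}$. Thus, in particular, the risk of the unbiased estimator $\frac{n-r-2}{n}S^{+}$ must equal $2\,\frac{n-r-2}{n^{2}}\,\mathrm{E}[\operatorname{tr}^{2}(S^{+})] - \frac{(n-r-2)(n-r-4)}{n^{2}}\,\mathrm{E}[\operatorname{tr}(S^{+2})] + \operatorname{tr}(\Sigma^{+2})$. When $a \le \frac{n-r-2}{n}$ we can bound
\begin{align*}
&\mathrm{E}\big[\|aS^{+} - \Sigma^{+}\|_F^{2}\big] - \mathrm{E}\Big[\Big\|\tfrac{n-r-2}{n}S^{+} - \Sigma^{+}\Big\|_F^{2}\Big] \\
&\quad= \frac{2}{n}\left(a - \frac{n-r-2}{n}\right)\mathrm{E}[\operatorname{tr}^{2}(S^{+})] + \left(a^{2} - 2\,\frac{n-r-3}{n}\,a + \frac{(n-r-2)(n-r-4)}{n^{2}}\right)\mathrm{E}[\operatorname{tr}(S^{+2})] \\
&\quad= \frac{2}{n}\left(a - \frac{n-r-2}{n}\right)\mathrm{E}[\operatorname{tr}^{2}(S^{+})] + \left(a - \frac{n-r-2}{n}\right)\left(a - \frac{n-r-4}{n}\right)\mathrm{E}[\operatorname{tr}(S^{+2})] \\
&\quad\le \frac{2}{n}\left(a - \frac{n-r-2}{n}\right)\mathrm{E}[\operatorname{tr}(S^{+2})] + \left(a - \frac{n-r-2}{n}\right)\left(a - \frac{n-r-4}{n}\right)\mathrm{E}[\operatorname{tr}(S^{+2})] \\
&\quad= \left(a - \frac{n-r-2}{n}\right)\left(a - \frac{n-r-6}{n}\right)\mathrm{E}[\operatorname{tr}(S^{+2})],
\end{align*}
which shows inequality (2.6). This upper bound has a minimum at $a = \frac{n-r-4}{n}$, which yields
\[
\mathrm{E}\Big[\Big\|\tfrac{n-r-4}{n}S^{+} - \Sigma^{+}\Big\|_F^{2}\Big] - \mathrm{E}\Big[\Big\|\tfrac{n-r-2}{n}S^{+} - \Sigma^{+}\Big\|_F^{2}\Big] \le -\frac{4}{n^{2}}\,\mathrm{E}[\operatorname{tr}(S^{+2})] < 0.
\]
Thus $\frac{n-r-4}{n}S^{+}$ dominates $\frac{n-r-2}{n}S^{+}$, as desired. Moreover, the URE of $S^{+}$ is $\frac{2}{n}\operatorname{tr}^{2}(S^{+}) - \frac{n-2r-6}{n}\operatorname{tr}(S^{+2})$ and so
\[
\mathrm{E}\Big[\Big\|\tfrac{n-r-2}{n}S^{+} - \Sigma^{+}\Big\|_F^{2}\Big] - \mathrm{E}\big[\|S^{+} - \Sigma^{+}\|_F^{2}\big] = -2\,\frac{r+2}{n^{2}}\,\mathrm{E}[\operatorname{tr}^{2}(S^{+})] - \frac{(r+2)(r+4)}{n^{2}}\,\mathrm{E}[\operatorname{tr}(S^{+2})] < 0,
\]
so $\frac{n-r-2}{n}S^{+}$ dominates $S^{+}$, as claimed. □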
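The algebra behind the risk-difference bound can be verified with exact rational arithmetic. The sketch below returns the two exact coefficients of the risk difference and the single coefficient of the upper bound. Their sum equals the bound coefficient, which is the identity used when $\mathrm{E}[\operatorname{tr}^{2}(S^{+})]$ is replaced by $\mathrm{E}[\operatorname{tr}(S^{+2})]$. The function names and the values of $n$ and $r$ are illustrative assumptions.

```python
from fractions import Fraction

def exact_coeffs(a, n, r):
    """Coefficients (c1, c2) of E[tr^2(S^+)] and E[tr(S^{+2})] in risk(a) - risk(a0)."""
    a = Fraction(a)
    a0 = Fraction(n - r - 2, n)            # scale of the unbiased estimator
    c1 = Fraction(2, n) * (a - a0)
    c2 = (a - a0) * (a - Fraction(n - r - 4, n))
    return c1, c2

def bound_coeff(a, n, r):
    """Coefficient of E[tr(S^{+2})] in the upper bound, valid for a <= (n-r-2)/n."""
    a = Fraction(a)
    return (a - Fraction(n - r - 2, n)) * (a - Fraction(n - r - 6, n))

n, r = 20, 5                               # illustrative values only
a_star = Fraction(n - r - 4, n)            # midpoint of the two roots of the bound
print(a_star, bound_coeff(a_star, n, r), Fraction(-4, n * n))
```

Since the bound is a parabola in $a$ with roots $\frac{n-r-2}{n}$ and $\frac{n-r-6}{n}$, its minimum sits at their midpoint, which is how the scale $\frac{n-r-4}{n}$ arises.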
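Proof of Proposition. We have $\psi_k = \frac{n-r-4}{n}\big[1/l_k + t\operatorname{tr}^{-1}(S)\big]$, so
\begin{align*}
\sum_{k=1}^{r}\psi_k^{2} &= \frac{(n-r-4)^{2}}{n^{2}}\sum_{k=1}^{r}\left[\frac{1}{l_k^{2}} + \frac{2t}{l_k\operatorname{tr}(S)} + \frac{t^{2}}{\operatorname{tr}^{2}(S)}\right] \\
&= \frac{(n-r-4)^{2}}{n^{2}}\operatorname{tr}(S^{+2}) + 2\,\frac{(n-r-4)^{2}}{n^{2}}\,t\,\frac{\operatorname{tr}(S^{+})}{\operatorname{tr}(S)} + \frac{(n-r-4)^{2}r}{n^{2}}\,\frac{t^{2}}{\operatorname{tr}^{2}(S)}, \\
-2\,\frac{n-r-2}{n}\sum_{k=1}^{r}\frac{\psi_k}{l_k} &= -2\,\frac{(n-r-2)(n-r-4)}{n^{2}}\sum_{k=1}^{r}\left[\frac{1}{l_k^{2}} + \frac{t}{l_k\operatorname{tr}(S)}\right] \\
&= -2\,\frac{(n-r-2)(n-r-4)}{n^{2}}\operatorname{tr}(S^{+2}) - 2\,\frac{(n-r-2)(n-r-4)}{n^{2}}\,t\,\frac{\operatorname{tr}(S^{+})}{\operatorname{tr}(S)}, \\
-\frac{4}{n}\sum_{k=1}^{r}\frac{\partial\psi_k}{\partial l_k} &= 4\,\frac{n-r-4}{n^{2}}\sum_{k=1}^{r}\left[\frac{1}{l_k^{2}} + \frac{t}{\operatorname{tr}^{2}(S)}\right] = 4\,\frac{n-r-4}{n^{2}}\operatorname{tr}(S^{+2}) + 4\,\frac{(n-r-4)r}{n^{2}}\,\frac{t}{\operatorname{tr}^{2}(S)}, \\
-\frac{2}{n}\sum_{k\neq b}\frac{\psi_k-\psi_b}{l_k-l_b} &= -2\,\frac{n-r-4}{n^{2}}\sum_{k\neq b}\frac{l_k^{-1}-l_b^{-1}}{l_k-l_b} = 2\,\frac{n-r-4}{n^{2}}\operatorname{tr}^{2}(S^{+}) - 2\,\frac{n-r-4}{n^{2}}\operatorname{tr}(S^{+2}).
\end{align*}
Summing everything, we get the URE
\begin{align*}
U ={}& 2\,\frac{n-r-4}{n^{2}}\operatorname{tr}^{2}(S^{+}) - \frac{(n-r-4)(n-r-2)}{n^{2}}\operatorname{tr}(S^{+2}) \\
&+ 4\,\frac{n-r-4}{n^{2}}\left[\frac{r}{\operatorname{tr}^{2}(S)} - \frac{\operatorname{tr}(S^{+})}{\operatorname{tr}(S)}\right]t + \frac{(n-r-4)^{2}r}{n^{2}}\,\frac{t^{2}}{\operatorname{tr}^{2}(S)}.
\end{align*}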


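Now note, using $\operatorname{tr}^{-1}(S) \le \operatorname{tr}(S^{+})/r^{2}$ and $\operatorname{tr}(S^{+2}) \le \operatorname{tr}^{2}(S^{+})$, that
\begin{align*}
&\mathrm{E}\left|\frac{n-r-2}{n}\sum_{k=1}^{r}\frac{\psi_k}{l_k} + \frac{2}{n}\sum_{k=1}^{r}\frac{\partial\psi_k}{\partial l_k} + \frac{1}{n}\sum_{k\neq b}\frac{\psi_k-\psi_b}{l_k-l_b}\right| \\
&\quad\le \frac{(n-r-3)(n-r-4)}{n^{2}}\,\mathrm{E}[\operatorname{tr}(S^{+2})] + \frac{n-r-4}{n^{2}}\,\mathrm{E}[\operatorname{tr}^{2}(S^{+})] \\
&\qquad+ \frac{(n-r-2)(n-r-4)}{n^{2}}\,|t|\,\mathrm{E}\left[\frac{\operatorname{tr}(S^{+})}{\operatorname{tr}(S)}\right] + 2\,\frac{(n-r-4)r}{n^{2}}\,|t|\,\mathrm{E}\left[\frac{1}{\operatorname{tr}^{2}(S)}\right] \\
&\quad\le \left(\frac{(n-r-2)(n-r-4)}{n^{2}} + \frac{(n-r-2)(n-r-4)}{n^{2}r^{2}}\,|t| + 2\,\frac{n-r-4}{n^{2}r^{3}}\,|t|\right)\mathrm{E}[\operatorname{tr}^{2}(S^{+})] \\
&\quad< \infty,
\end{align*}
since $\mathrm{E}[\operatorname{tr}^{2}(S^{+})] < \infty$ by equation (5.2). Therefore, we can apply Theorem 2 to obtain
\begin{align*}
\mathrm{E}\big[\|O_1\Psi O_1^{t} - \Sigma^{+}\|_F^{2}\big] ={}& \mathrm{E}[U] + \operatorname{tr}(\Sigma^{+2}) \\
={}& 2\,\frac{n-r-4}{n^{2}}\,\mathrm{E}[\operatorname{tr}^{2}(S^{+})] - \frac{(n-r-4)(n-r-2)}{n^{2}}\,\mathrm{E}[\operatorname{tr}(S^{+2})] \\
&+ 4\,\frac{n-r-4}{n^{2}}\,t\,\mathrm{E}\left[\frac{r}{\operatorname{tr}^{2}(S)} - \frac{\operatorname{tr}(S^{+})}{\operatorname{tr}(S)}\right] + \frac{(n-r-4)^{2}r}{n^{2}}\,t^{2}\,\mathrm{E}\left[\frac{1}{\operatorname{tr}^{2}(S)}\right] + \operatorname{tr}(\Sigma^{+2})
\end{align*}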

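for all $t \in \mathbb{R}$. Using that $n-r-4 > 0$ and $r^{2}\operatorname{tr}^{-1}(S) \le \operatorname{tr}(S^{+})$ again, we can bound the difference in risk, for $t \ge 0$, as
\[
\mathrm{E}\big[\|O_1\Psi O_1^{t} - \Sigma^{+}\|_F^{2}\big] - \mathrm{E}\Big[\Big\|\tfrac{n-r-4}{n}S^{+} - \Sigma^{+}\Big\|_F^{2}\Big] \le \frac{(n-r-4)r}{n^{2}}\Big[(n-r-4)t^{2} - 4(r-1)t\Big]\,\mathrm{E}\left[\frac{1}{\operatorname{tr}^{2}(S)}\right],
\]
which proves inequality (2.7). There is a minimum in $t$ since $n-r-4 > 0$, which is $t = \frac{2(r-1)}{n-r-4}$. In this case the quadratic coefficient, and thus the difference in risk, is strictly negative when $r > 1$, so the corresponding estimator $\hat\Sigma^{+}_{EM2}$, with eigenvalue functions $\psi_k = \frac{n-r-4}{n}\,l_k^{-1} + \frac{2(r-1)}{n}\operatorname{tr}^{-1}(S)$, dominates $\hat\Sigma^{+}_{EM1}$ (the case $t = 0$), as desired. □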
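The bound on the risk difference in fact holds pointwise in the eigenvalues, not only in expectation, because it rests on the deterministic inequality $r^{2} \le \operatorname{tr}(S)\operatorname{tr}(S^{+})$. The sketch below compares the $t$-dependent part of the URE with the bound for a hypothetical spectrum. The function names and all numerical values are assumptions made for illustration.

```python
def ure_diff(t, l, n):
    """t-dependent part of the URE: U(t) - U(0)."""
    r = len(l)
    c = (n - r - 4) / n
    tr_s = sum(l)                      # tr(S)
    tr_sp = sum(1.0 / x for x in l)    # tr(S^+)
    return 4.0 * c / n * (r / tr_s ** 2 - tr_sp / tr_s) * t + c * c * r * t * t / tr_s ** 2

def bound(t, l, n):
    """Upper bound (n-r-4) r / n^2 [(n-r-4) t^2 - 4 (r-1) t] / tr^2(S), for t >= 0."""
    r = len(l)
    tr_s = sum(l)
    return (n - r - 4) * r / n ** 2 * ((n - r - 4) * t * t - 4 * (r - 1) * t) / tr_s ** 2

l = [0.5, 1.0, 3.0]                    # hypothetical non-zero eigenvalues of S
n = 15
t_opt = 2.0 * (len(l) - 1) / (n - len(l) - 4)
print(t_opt, ure_diff(t_opt, l, n), bound(t_opt, l, n))
```

When all eigenvalues coincide the two quantities are equal, so the bound cannot be improved without further assumptions on the spectrum.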


5.4. Proofs of Subsection 2.4


Proof of Theorem 3. Since $T$ and $S$ share the same non-zero eigenvalues, we can regard $\Psi$ as a function of $T \sim W_r(n-1, \Omega/n)$ only. Moreover, $\bar X = \mu + HY$. Using that $O_1O_1^{t} = HH^{t}$ almost surely, we find


E 


E+X - E+/r 


= E 


= E 


2 

t \ Tjt 


+ U] - 


OiO\OiAiO\OiO\[^i + HY] - 
2 


Define $G = H^{t}\mu + Y \sim N_r(H^{t}\mu, \Omega/n)$ and notice it is independent of $U\Psi U^{t}$ since $\bar X$ and $S$ are independent. Then
since X and S are independent. Then 

2" 

2 

= 2E[(G - - 2E[tr(U-it/TU*GG‘)] 

+ e[g^ua!‘^u^g] - e[(g - H^nYn-\G + • 


= E 


uA>ww - 
The first term can be handled as follows. By independence of $G$ and $U\Psi U^{t}$, and Stein's lemma (Fourdrinier and Strawderman, 2003, Lemma A.1), we get


2E[{G - H^uYn-^UW^G] = Eg 


n 


{G - 


-1 


Et [U^UY G 


= -Eg \VgG'Et [C/TC/*11 = -Etr [T] 
n n 


under the condition 


Eg 


VgG'Et [C/T17*] 


= Eg 


tr(T) 


= E 


k=l 


< OO. 


For the second term, we will make use of the fact that
\[
\mathrm{E}_T\big[\Omega^{-1}U\Psi U^{t}\big] = \mathrm{E}_T\big[U\Psi^{*}U^{t}\big], \tag{5.3}
\]
where $\Psi^{*}$ is defined as in the statement. This is the result of a non-singular analogue of Theorem 2.2 from Konno (2009), or alternatively of a matrix analogue of Lemma 3 from Chetelat and Wells (2014). By appropriate modifications to the latter result and the underlying Lemma 3 from Chetelat and Wells (2012) on which it depends, it can be seen that sufficient conditions for equation (5.3) to hold are
\[
\mathrm{E}_T\Big|\big[U\Psi^{*}U^{t}\big]_{ij}\Big| < \infty \qquad \forall\, 1 \le i, j \le r.
\]
A sufficient condition for this to happen is
\[
\max_{i,j}\,\mathrm{E}_T\Big|\big[U\Psi^{*}U^{t}\big]_{ij}\Big| \le \mathrm{E}\left[\sum_{k=1}^{r}|\psi_k^{*}|\right] < \infty.
\]


Then, using the independence of G and T, we can conclude 

- 2E[tr(0-^CT[/^GG*)] = -2tr(ET Eg [GG*]) 

= -2tr(ET [WG*] Eg [GG*]) = -2E[G*GT*G*G] . 


Thus 


E 


S+A - 




= ^E[tr(T)] -2E[G*GT*G*G] +E[G^U^^U^G] 
- E[(G - H^^iYVL-‘^{G + • 
But $U^{t}G = O_1^{t}H[H^{t}\mu + Y] = O_1^{t}\bar X$ and $(G - H^{t}\mu)^{t}\Omega^{-2}(G + H^{t}\mu) = (G - H^{t}\mu)^{t}H^{t}\Sigma^{+2}H(G + H^{t}\mu) = (\bar X - \mu)^{t}\Sigma^{+2}(\bar X + \mu)$. Hence


E 


E+X 



= E 


-trS+ + X*Oi(^'^ 
n 


2^*)0iX 


-E[(X-^)*E+2(x+ /.)]. 


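This proves the result. □

Proof of Proposition. We have $\psi_k = a/l_k$, so
\begin{align*}
\psi_k^{*} &= \frac{n-r-2}{n}\,\frac{\psi_k}{l_k} + \frac{2}{n}\,\frac{\partial\psi_k}{\partial l_k} + \frac{1}{n}\sum_{b\neq k}\frac{\psi_k-\psi_b}{l_k-l_b} \\
&= a\left[\frac{n-r-2}{n}\,\frac{1}{l_k^{2}} - \frac{2}{n}\,\frac{1}{l_k^{2}} - \frac{1}{n}\,\frac{\operatorname{tr}(S^{+})}{l_k} + \frac{1}{n}\,\frac{1}{l_k^{2}}\right] \\
&= a\left[\frac{n-r-3}{n}\,\frac{1}{l_k^{2}} - \frac{1}{n}\,\frac{\operatorname{tr}(S^{+})}{l_k}\right].
\end{align*}
We can bound
\[
\mathrm{E}\left|\sum_{k=1}^{r}\psi_k\right| = |a|\,\mathrm{E}[\operatorname{tr}(S^{+})] \le |a|\,\mathrm{E}[\operatorname{tr}^{2}(S^{+})]^{1/2}
\]
and
\[
\mathrm{E}\left[\sum_{k=1}^{r}|\psi_k^{*}|\right] \le \frac{n-r-3}{n}\,|a|\,\mathrm{E}[\operatorname{tr}(S^{+2})] + \frac{1}{n}\,|a|\,\mathrm{E}[\operatorname{tr}^{2}(S^{+})],
\]
so by inequality (5.2) and the fact that $n-r-4 > 0$ these two expressions are finite. Therefore, we can apply the results of Theorem 3 to obtain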


E 


aS+X - E+^ 
+ E 

- E 


= —aErtr(S'^)l 

n 




n 


kk 


(X-,.)‘E+ 2 (X + ,.) 


= ^»E[ti(S+)] + (a^ - 2 " I \ ) E[.?‘S+=.?] 


2 ^ 
H —a E 
n 


triS+)X^S+X -E (X-/i)*S+2(X + ^) 
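for any $a \in \mathbb{R}$. Therefore, for $a \le \frac{n-r-2}{n}$ we can bound the difference in risk by
\begin{align*}
&\mathrm{E}\big[\|aS^{+}\bar X - \Sigma^{+}\mu\|^{2}\big] - \mathrm{E}\Big[\Big\|\tfrac{n-r-2}{n}S^{+}\bar X - \Sigma^{+}\mu\Big\|^{2}\Big] \\
&\quad= \frac{2}{n}\left(a - \frac{n-r-2}{n}\right)\mathrm{E}[\operatorname{tr}(S^{+})] + \left(a - \frac{n-r-2}{n}\right)\left(a - \frac{n-r-4}{n}\right)\mathrm{E}\big[\bar X^{t}S^{+2}\bar X\big] \\
&\qquad+ \frac{2}{n}\left(a - \frac{n-r-2}{n}\right)\mathrm{E}\big[\operatorname{tr}(S^{+})\,\bar X^{t}S^{+}\bar X\big] \\
&\quad\le \left(a - \frac{n-r-2}{n}\right)\left(a - \frac{n-r-4}{n}\right)\mathrm{E}\big[\bar X^{t}S^{+2}\bar X\big],
\end{align*}
which proves inequality (2.8). The quadratic coefficient is minimized at $a = \frac{n-r-3}{n}$, at which point we have
\[
\mathrm{E}\Big[\Big\|\tfrac{n-r-3}{n}S^{+}\bar X - \Sigma^{+}\mu\Big\|^{2}\Big] - \mathrm{E}\Big[\Big\|\tfrac{n-r-2}{n}S^{+}\bar X - \Sigma^{+}\mu\Big\|^{2}\Big] \le -\frac{1}{n^{2}}\,\mathrm{E}\big[\bar X^{t}S^{+2}\bar X\big] < 0.
\]
Thus $\frac{n-r-3}{n}S^{+}\bar X$ dominates $\frac{n-r-2}{n}S^{+}\bar X$, as desired. Moreover,
\begin{align*}
&\mathrm{E}\Big[\Big\|\tfrac{n-r-2}{n}S^{+}\bar X - \Sigma^{+}\mu\Big\|^{2}\Big] - \mathrm{E}\big[\|S^{+}\bar X - \Sigma^{+}\mu\|^{2}\big] \\
&\quad= -2\,\frac{r+2}{n^{2}}\,\mathrm{E}[\operatorname{tr}(S^{+})] - \frac{(r+2)(r+4)}{n^{2}}\,\mathrm{E}\big[\bar X^{t}S^{+2}\bar X\big] - 2\,\frac{r+2}{n^{2}}\,\mathrm{E}\big[\operatorname{tr}(S^{+})\,\bar X^{t}S^{+}\bar X\big] < 0,
\end{align*}
so $\frac{n-r-2}{n}S^{+}\bar X$ dominates $S^{+}\bar X$, as claimed.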
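Both ingredients of this proof can be checked mechanically: the closed form of $\psi_k^{*}$ for $\psi_k = a/l_k$, and the location and value of the minimum of the bound. The sketch below does so for a hypothetical spectrum. The helper names and numerical values are illustrative assumptions, and the definition of $\psi_k^{*}$ used is the three-term expression appearing in this proof.

```python
from fractions import Fraction

def psi_star(psi, dpsi, l, n):
    """psi*_k = (n-r-2)/n psi_k/l_k + (2/n) dpsi_k + (1/n) sum_{b != k} (psi_k-psi_b)/(l_k-l_b)."""
    r = len(l)
    out = []
    for k in range(r):
        cross = sum((psi[k] - psi[b]) / (l[k] - l[b]) for b in range(r) if b != k)
        out.append((n - r - 2) / n * psi[k] / l[k] + 2.0 / n * dpsi[k] + cross / n)
    return out

def psi_star_closed(a, l, n):
    """Closed form a [ (n-r-3)/n / l_k^2 - tr(S^+) / (n l_k) ]."""
    r = len(l)
    tr_sp = sum(1.0 / x for x in l)
    return [a * ((n - r - 3) / n / x ** 2 - tr_sp / (n * x)) for x in l]

def bound_coeff(a, n, r):
    """Coefficient of E[X^t S^{+2} X] in the bound, valid for a <= (n-r-2)/n."""
    a = Fraction(a)
    return (a - Fraction(n - r - 2, n)) * (a - Fraction(n - r - 4, n))

l = [0.7, 1.1, 2.5]          # hypothetical distinct non-zero eigenvalues of S
n, a = 12, 0.5
lhs = psi_star([a / x for x in l], [-a / x ** 2 for x in l], l, n)
rhs = psi_star_closed(a, l, n)
print(lhs, rhs, bound_coeff(Fraction(n - len(l) - 3, n), n, len(l)))
```

Here the roots of the bound are $\frac{n-r-2}{n}$ and $\frac{n-r-4}{n}$, so the minimizing scale is $\frac{n-r-3}{n}$ rather than the $\frac{n-r-4}{n}$ found for the precision matrix itself.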


Proof of Proposition We will applyj^ and we have here V’fc 
ttr“^(5)] for 1 < A: < r, so 

n — r — 2'ifk (n — r — 2){n — r — 3) 


□ 

^[1/4+ 
2 _ ^ n-r -3

n ^ dlh 
Jl /fctr(5)J ’


1 t 


1 f^k ~ V’fc 
Therefore
Ik
We can bound
E


E 
<


{n — r — 3)^ 




E[tr(5+2)] + 


n — r — 3 




E[tr2(5+)] 


(n — r)(n — r — 3), , 

+ - -- -\t\ E[tr-2(5)] , 




sobytr”^ <tr(5"'“)/r^, inequality (5.2) and the fact that n — r — 4 > 0 these 
two expressions are finite. Therefore, we can apply the results of Theorem 
[3] to obtain 
n — r — 3




E[tr(S+)] +2 


[n — r — 3)r 




fE[tr-^(,S)] 
^^ (n-r-3)M ^ ^ (n-r-3)^ ^tr ^(S)
-t




^k 
= E [tr(S+)] + E [tr-‘ (S)]
+ E
“r " "
= 2 -


n — r — 3 




E[tr(5+)] - 


{n — r — 3)^ 




E[X^S+^X] 


+ 2 - 


n — r — 3 


n^ 


E[tr(5+)X*5+X] + ^^'’ E[tr-^(5)] 


n — r — 3 ^ 
— 2-^-E 


(n — r — 3)^ 2 ^ 
+ ^- -t^E 


X^S+X 
[ tr(5) j 
X^X 


n — r — 3 ^ 
+ 4-^-E 




X^X 

Lt^J 


- E 


{X - irfE+\X + 


n^ [tr2(5)J 

for any f G M. Therefore, the difference in risk can be written 
|2 
Vt-'T]


-E 


n 


= 2 


,(n-r-3)r . , 


, n — r — 3 ^ 
+ 4-^-E 


E[tr-^(5)] -2 
n — r — 3
Ltr2(5)J


(n — r — 3)^ 2 




X^X 


Ltr2(5)J • 


Buttr(XX*) = tr(,S5+XX*) < tri(52)tr5 ([5+XX*]2) < tr(5)tr(5+XX*), 
so we can bound 


< 2 (" -y-3)’- tE[tr-l(S)]+2 " ~’:~^ E 






X^X 

Ltr^J 


(n — r — 3)2 2 ^ 
+ 2-^- ^-t^E 
X^X

tr2(5) 


Next, write the reduced singular value decomposition of $X$ as $\sqrt{n}\,V_1L^{1/2}O_1^{t}$ with $V_1$ an $n \times r$ semi-orthogonal matrix, $V_1^{t}V_1 = I_r$. Then


X^X = tr(X 


t 

9 


X] =ti[LV, 


1 1 * 


'1 


n 


Vi 


< tr(L)amax( ) < tr(S')o-„ 


1 1 * 
n 


= tr(5). 


Therefore, we can bound by 
(n — r — 3) 


< 




2(r -|- l)t -|- (n — r — 3)r 


E 


_tr(5)J ’ 


which proves (2.9). Since $n-r-3 > 0$, the quadratic coefficient has a
r+l 
n—r—3 

-1 


minimum, at t = — In this case we have 
n — r — 3


< - 


n 

(r 1) 


g+ (r- + l)tr ^{S) 
n — r — 3 


■ 

2 

r 

X -7] 

2 

-E 


" - - =^ 5+;? - ri ^ 


n 


E 


tr(5) 


< 0 . 


Thus 7 ?tk2 = ^ ^ ^ •S’"'' — T^itin tr ^{S) dominates r/TKij as desired. □ 